xen.git
Keir Fraser [Wed, 16 Sep 2009 08:21:56 +0000 (09:21 +0100)]
AMD IOMMU: Rework of interrupt remapping

1) Parsing IVRS special device entry in order to handle ioapic
remapping correctly.
2) Allocating per-device interrupt remapping tables instead of using a
global interrupt remapping table.
3) Some system devices, such as the north-bridge IO-APIC, cannot be
discovered during PCI device enumeration. To remap interrupts from
those devices, the device table update is split into 2 steps, so
that interrupt remapping tables can be bound to device table entries
earlier than I/O page tables.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Wed, 16 Sep 2009 08:16:38 +0000 (09:16 +0100)]
x86: irq ratelimit

This patch adds IRQ rate limiting. It temporarily masks a (guest)
interrupt if too many IRQs are observed in a short period (an IRQ
storm), to keep Xen and other guests responsive.

For now, the threshold can be adjusted at boot time using the
command-line option irq_ratelimit=xxx.
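
A minimal sketch of per-IRQ rate limiting, assuming a fixed 10 ms window and a counter that is reset at the start of each window. The struct, the helpers and the window length are hypothetical; only the threshold corresponds to irq_ratelimit, and Xen's actual implementation may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-IRQ bookkeeping; not Xen's actual structures. */
struct irq_rl_state {
    uint64_t window_start;  /* start of the current window, in ns */
    unsigned int count;     /* IRQs seen in the current window */
    bool masked;            /* true while the IRQ is throttled */
};

#define IRQ_RL_WINDOW_NS 10000000ULL  /* assumed 10 ms window */

/*
 * Account for one interrupt arriving at time `now` (ns). Returns true
 * if the IRQ is (or becomes) masked because more than `threshold`
 * interrupts arrived in the current window.
 */
static bool irq_rl_tick(struct irq_rl_state *s, uint64_t now,
                        unsigned int threshold)
{
    assert(now >= s->window_start);

    if (now - s->window_start >= IRQ_RL_WINDOW_NS) {
        s->window_start = now;  /* start a new window */
        s->count = 0;
    }
    if (++s->count > threshold)
        s->masked = true;
    return s->masked;
}

/* Called later (e.g. from a timer) to let the IRQ through again. */
static void irq_rl_unmask(struct irq_rl_state *s, uint64_t now)
{
    s->masked = false;
    s->window_start = now;
    s->count = 0;
}
```

Once masked, the IRQ stays masked until something calls irq_rl_unmask(). A fixed window keeps the per-interrupt cost to one comparison and one increment.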

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 16 Sep 2009 07:55:23 +0000 (08:55 +0100)]
x86 hvm: Guests should scan CPUID range 40000000-4000ff00 for Xen leaves.
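
A sketch of the scan a guest might perform. It steps through 0x40000000-0x4000ff00 in 0x100 increments and checks for the "XenVMMXenVMM" signature in EBX:ECX:EDX. The CPUID callback type, the fake_cpuid() stand-in (which pretends Xen's leaves start at 0x40000100, as if another interface occupied 0x40000000) and the "at least two further leaves" check are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* CPUID hook: fills regs[0..3] with EAX, EBX, ECX, EDX for `leaf`. */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t regs[4]);

/*
 * Scan 0x40000000..0x4000ff00 in steps of 0x100 for Xen's signature.
 * Returns the base leaf, or 0 if Xen's leaves were not found.
 */
static uint32_t find_xen_cpuid_base(cpuid_fn cpuid)
{
    for (uint32_t base = 0x40000000u; base <= 0x4000ff00u; base += 0x100u) {
        uint32_t r[4];

        cpuid(base, r);
        /* EBX, ECX, EDX are contiguous in r[]; x86 is little-endian,
         * so their bytes appear in signature order. */
        if (memcmp(&r[1], "XenVMMXenVMM", 12) == 0 &&
            r[0] >= base + 2)       /* assumed: need 2 more leaves */
            return base;
    }
    return 0;
}

/* Illustration only: pretend Xen's leaves start at 0x40000100. */
static void fake_cpuid(uint32_t leaf, uint32_t regs[4])
{
    memset(regs, 0, 4 * sizeof(uint32_t));
    if (leaf == 0x40000100u) {
        regs[0] = 0x40000102u;                 /* highest Xen leaf */
        memcpy(&regs[1], "XenVMMXenVMM", 12);
    }
}
```

Taking the CPUID instruction as a parameter lets the scan logic run without real hardware. A real guest would pass a wrapper around the instruction.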

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 15 Sep 2009 09:08:12 +0000 (10:08 +0100)]
xenoprof: force use of architectural perfmon instead of the
CPU-specific event set, which may not yet be supported by the
oprofile user-space tool.

Signed-off-by: Yang Zhang <yang.zhang@intel.com>
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Keir Fraser [Tue, 15 Sep 2009 09:03:16 +0000 (10:03 +0100)]
xenoprof: support Intel's architectural perfmon registers.

One benefit is that more perfmon counters can be used on Nehalem.
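
Architectural perfmon counters are enumerated through CPUID leaf 0xA. The sketch below decodes the fields documented in the Intel SDM: the version, the number and width of general-purpose counters, and the number of fixed counters. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of CPUID leaf 0xA (architectural performance monitoring). */
struct arch_perfmon_info {
    unsigned int version;    /* EAX[7:0]   version ID */
    unsigned int num_gp;     /* EAX[15:8]  general-purpose counters */
    unsigned int gp_width;   /* EAX[23:16] counter width in bits */
    unsigned int num_fixed;  /* EDX[4:0]   fixed counters (version >= 2) */
};

static struct arch_perfmon_info decode_cpuid_0xa(uint32_t eax, uint32_t edx)
{
    struct arch_perfmon_info info;

    info.version   = eax & 0xff;
    info.num_gp    = (eax >> 8) & 0xff;
    info.gp_width  = (eax >> 16) & 0xff;
    info.num_fixed = info.version >= 2 ? (edx & 0x1f) : 0;
    return info;
}
```

For example, EAX = 0x07300403 decodes to version 3 with four 48-bit general-purpose counters.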

Signed-off-by: Yang Zhang <yang.zhang@intel.com>
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Keir Fraser [Tue, 15 Sep 2009 09:02:15 +0000 (10:02 +0100)]
xenoprof: add support for Core i7 and Atom.

Signed-off-by: Yang Zhang <yang.zhang@intel.com>
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Keir Fraser [Tue, 15 Sep 2009 08:54:16 +0000 (09:54 +0100)]
x86: Free unused pages of per-cpu data.

As well as freeing data pages for impossible cpus, we also free pages
of all other cpus which contain no actual data (because of too-large
statically-defined PERCPU_SHIFT).

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:52:26 +0000 (09:52 +0100)]
x86: Re-increase size of percpu area

The per-cpu vector code adds a lot of percpu data. With perfc also
enabled, one page per cpu is no longer enough.

Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Keir Fraser [Tue, 15 Sep 2009 08:46:08 +0000 (09:46 +0100)]
p2m: Fix debug build.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:26:52 +0000 (09:26 +0100)]
Update QEMU_TAG to 3de6cb51b19c46967cbc88ceb202b240c736eeca

Keir Fraser [Tue, 15 Sep 2009 08:26:08 +0000 (09:26 +0100)]
xend: Fix VDI.get_record

VDI.get_record did not return the correct VDI records.
This patch makes it return the correct records.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Tue, 15 Sep 2009 08:25:41 +0000 (09:25 +0100)]
x86 mce: Fix panic in mcheck_mca_logout

I hit a panic in mcheck_mca_logout(). MSR_IA32_MCi_ADDR may hold
values other than a machine address, and a FATAL PAGE FAULT occurred
when such a non-existent address was passed to maddr_get_owner().

Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
Keir Fraser [Tue, 15 Sep 2009 08:24:59 +0000 (09:24 +0100)]
Vt-d: queued invalidation cleanup

This patch cleans up queued invalidation, including the ring
wrap-around check, multiple polling status and other minor changes.
This version uses a local variable as the polling address, which is
cleaner.
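
A sketch of a ring wrap-around check for a circular invalidation queue. It assumes software advances the tail, hardware advances the head, and one slot is kept free so that head == tail means empty. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring: software advances tail, hardware advances head. */
struct qi_ring {
    unsigned int head;   /* next entry hardware will fetch */
    unsigned int tail;   /* next free slot for software */
    unsigned int size;   /* number of entries */
};

/* One slot stays unused so that head == tail unambiguously means empty. */
static bool qi_ring_full(const struct qi_ring *q)
{
    return (q->tail + 1) % q->size == q->head;
}

/* Claim the next slot, wrapping around at the end of the queue. */
static bool qi_ring_push(struct qi_ring *q, unsigned int *slot)
{
    assert(q->size > 1);
    if (qi_ring_full(q))
        return false;
    *slot = q->tail;
    q->tail = (q->tail + 1) % q->size;
    return true;
}
```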

Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Keir Fraser [Tue, 15 Sep 2009 08:23:44 +0000 (09:23 +0100)]
x86: Remove PSE flag from PV guest CR4 and CPUID.
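
A sketch of hiding PSE from a PV guest by clearing CPUID.01H:EDX bit 3 and CR4 bit 4 in the values the guest sees. The helper names are hypothetical, and the sketch does not show where Xen applies these masks.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID1_EDX_PSE (1u << 3)    /* CPUID.01H:EDX.PSE */
#define X86_CR4_PSE    (1ul << 4)   /* CR4.PSE */

/* Hide PSE from the CPUID leaf 1 EDX value a PV guest sees. */
static uint32_t pv_filter_cpuid1_edx(uint32_t edx)
{
    return edx & ~CPUID1_EDX_PSE;
}

/* Strip PSE from the CR4 value a PV guest reads back. */
static unsigned long pv_guest_cr4(unsigned long host_cr4)
{
    return host_cr4 & ~X86_CR4_PSE;
}
```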

From: Dave McCracken <dcm@mccr.org>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:21:34 +0000 (09:21 +0100)]
pygrub: Correct pygrub return value

This patch corrects the return value of pygrub's checkPassword()
function. It didn't return False at the end of the function; it
returned None instead, which is also false in a boolean context, so it
was working fine and this is most likely just a cosmetic issue.

Also, the missing () were added to the hasPassword call in
checkPassword(), and an unnecessary comment was removed.

Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Tue, 15 Sep 2009 08:20:47 +0000 (09:20 +0100)]
xend: Receive error message of migration from destination server

The following error message was shown by the xm migrate command.
In fact, I caused the command error on purpose.  I prepared a
destination server where free memory was insufficient, and then
I tried to migrate a VM to the destination server.  As I had
expected, the command error occurred.  However, the error message
was different from what I expected.  I would like to show the error
message from the destination server if an error occurred on the
destination server.

# xm migrate --live vm3 bx339
Error: (107, 'Transport endpoint is not connected')
Usage: xm migrate <Domain> <Host>

Migrate a domain to another machine.

Options:

-h, --help           Print this help.
-l, --live           Use live migration.
-p=portnum, --port=portnum
                     Use specified port for migration.
-n=nodenum, --node=nodenum
                     Use specified NUMA node on target.
-s, --ssl            Use ssl connection for migration.

If the destination server sends an error message, this patch shows
it.  For example, the following error message is shown if the
destination server has insufficient free memory.

# xm migrate --live vm3 bx339
Error: I need 262144 KiB, but dom0_min_mem is 716800 and shrinking
to 716800 KiB would leave only 50368 KiB free. (from bx339)
Usage: xm migrate <Domain> <Host>

Migrate a domain to another machine.

Options:

-h, --help           Print this help.
-l, --live           Use live migration.
-p=portnum, --port=portnum
                     Use specified port for migration.
-n=nodenum, --node=nodenum
                     Use specified NUMA node on target.
-s, --ssl            Use ssl connection for migration.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Tue, 15 Sep 2009 08:19:23 +0000 (09:19 +0100)]
Replace magic number for NULL (~0) with PAGE_LIST_NULL
...in the page_list_* functions.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:16:52 +0000 (09:16 +0100)]
blktap2: Fix off-by-one error in driver lookup

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:16:19 +0000 (09:16 +0100)]
PoD: Implement PoD for EPT

This patch implements the populate-on-demand functionality for EPT.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:15:14 +0000 (09:15 +0100)]
p2m: Reorganize p2m_pod_demand_populate in preparation for EPT PoD patch

p2m_pod_demand_populate is too non-EPT-p2m-centric.  Reorganize code
to have a p2m-specific call that wraps a generic PoD call.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:14:36 +0000 (09:14 +0100)]
EPT: Clean up some code

Clean up and reorganize some code in preparation for adding
populate-on-demand functionality.

Should be no functional difference.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:13:38 +0000 (09:13 +0100)]
PoD: Check p2m assumption in debug builds

The PoD code assumes that if:
* A page is in a domain's p2m table
* And it's owned by the domain
* And it's not a xenheap page
then:
* It's on the domain's page list.

This patch adds a check for this assumption when debug=y.
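
A sketch of such a debug-only check. The assumption is expressed as an assert() that is compiled out when NDEBUG is defined, which stands in here for debug=n. The page and domain structures are hypothetical simplifications, not Xen's struct page_info or struct domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified page and domain descriptors. */
struct domain;
struct page {
    struct domain *owner;
    bool is_xenheap;
    struct page *next;       /* link in the owner's page list */
};
struct domain {
    struct page *page_list;  /* singly linked for simplicity */
};

static bool on_page_list(const struct domain *d, const struct page *pg)
{
    for (const struct page *p = d->page_list; p != NULL; p = p->next)
        if (p == pg)
            return true;
    return false;
}

/*
 * Debug-only check of the PoD assumption: a page found in d's p2m that
 * d owns and that is not a xenheap page must be on d's page list.
 */
static void pod_check_p2m_page(const struct domain *d, const struct page *pg)
{
#ifndef NDEBUG
    if (pg->owner == d && !pg->is_xenheap)
        assert(on_page_list(d, pg));
#else
    (void)d;
    (void)pg;
#endif
}
```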

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:13:01 +0000 (09:13 +0100)]
PoD: Fix debug build.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:09:18 +0000 (09:09 +0100)]
PoD: Don't reclaim xenheap pages in zero-sweep

Don't reclaim xenheap-allocated pages in the zero-sweep.  This avoids
grabbing things like grant tables mapped in the p2m.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:08:36 +0000 (09:08 +0100)]
PoD: Scrub pages before adding to the cache

Neither memory from the allocator nor memory from
the balloon driver is guaranteed to be zero.  Scrub it
before adding to the cache.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:06:46 +0000 (09:06 +0100)]
passthrough: remove pointless error checks

map_domain_page() cannot return NULL. And if it could, both instances
changed here would leak memory in such a case.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 9 Sep 2009 15:39:41 +0000 (16:39 +0100)]
x86: add an extra check when validating a huge pv L2 entry

While get_page_and_type_from_pagenr() (through get_page_from_pagenr())
does the needed mfn_valid() check, get_data_page() doesn't; since it is
passed a struct page_info pointer, it really expects its caller(s) to
do so.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 9 Sep 2009 15:32:25 +0000 (16:32 +0100)]
Fix an obviously inverted check in offline_page()

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 9 Sep 2009 14:34:37 +0000 (15:34 +0100)]
Fix typo in c/s 20158:f9ce5858.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 9 Sep 2009 14:33:30 +0000 (15:33 +0100)]
xm,xend: Make cpus parameter available

When I started a VM using the xm create command, the cpus parameter in
the VM configuration file was ignored.  The problem occurred only when I
used XenAPI. This patch makes the parameter available.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Wed, 9 Sep 2009 14:32:30 +0000 (15:32 +0100)]
mount /proc/xen in init.d/xen

pvops dom0 kernels have a separate xenfs which has to be mounted on
/proc/xen.  Systems with older configurations don't have xenfs listed
in fstab, and it can sometimes make sense to keep it that way (for
example, if the dom0 wants to boot a native-only kernel too).

The attached patch to the script which ends up in /etc/init.d/xend
mounts /proc/xen if it appears to be necessary.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Tue, 8 Sep 2009 14:11:52 +0000 (15:11 +0100)]
x86: Fix typo in p2m_pod_set_cache_target

Fix typo in p2m_pod_set_cache_target by defining (1<<9) as
SUPERPAGE_PAGES
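
A sketch of the constant, assuming 4 KiB pages so that 1<<9 pages make one 2 MiB superpage. SUPERPAGE_ORDER and round_down_to_superpage() are hypothetical names used for illustration.

```c
#include <assert.h>

#define PAGE_SHIFT      12                        /* 4 KiB pages, assumed */
#define SUPERPAGE_ORDER 9                         /* hypothetical name */
#define SUPERPAGE_PAGES (1UL << SUPERPAGE_ORDER)  /* 512 pages = 2 MiB */

/* Round a page count down to a whole number of superpages. */
static unsigned long round_down_to_superpage(unsigned long pages)
{
    return pages & ~(SUPERPAGE_PAGES - 1);
}
```

Naming the constant avoids scattering the literal (1<<9) through the code, where a mistyped shift is easy to miss.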

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 8 Sep 2009 14:11:18 +0000 (15:11 +0100)]
Update QEMU_TAG to 2836e73adcd994de071f4eec1aa538a5ca849118

Keir Fraser [Tue, 8 Sep 2009 14:10:59 +0000 (15:10 +0100)]
xend: Fix syntax error

Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Tue, 8 Sep 2009 14:10:31 +0000 (15:10 +0100)]
VT-d: prevent dom0 to use VT-d HW

The pv-ops dom0 kernel contains the upstream Linux VT-d driver and
will try to enable it when VT-d is set in the kernel config file.
VT-d should not be enabled in dom0.

Xen already zaps the ACPI DMAR signature to prevent dom0 from using the
VT-d HW when VT-d is enabled for Xen. But when VT-d is not enabled for
Xen and VT-d is set in the pv-ops kernel config file, the pv-ops dom0
will try to enable it, which causes the pv-ops dom0 boot to fail. This
patch prevents dom0 from using the VT-d HW whether VT-d is enabled or
disabled for Xen.
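
One way to hide an ACPI table from dom0, sketched under assumptions: rename the DMAR signature (here to "XMAR") and lower the checksum by the same amount, so the table still validates. ACPI requires all bytes of a table to sum to zero modulo 256. The helper names and the choice of 'X' are assumptions, not necessarily what Xen does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard 36-byte ACPI table header (naturally aligned, no padding). */
struct acpi_table_header {
    char     signature[4];
    uint32_t length;          /* whole table, header included */
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    char     creator_id[4];
    uint32_t creator_revision;
};
_Static_assert(sizeof(struct acpi_table_header) == 36, "ACPI header size");

/* Sum of all bytes modulo 256; a valid ACPI table sums to 0. */
static uint8_t acpi_sum(const void *table, size_t len)
{
    const uint8_t *p = table;
    uint8_t sum = 0;

    while (len--)
        sum += *p++;
    return sum;
}

/* Recompute the checksum so the whole table sums to zero. */
static void acpi_fix_checksum(struct acpi_table_header *t)
{
    t->checksum = 0;
    t->checksum = (uint8_t)(0u - acpi_sum(t, t->length));
}

/* Rename "DMAR" to "XMAR" so the OS no longer finds it; keep the sum at 0. */
static void dmar_zap(struct acpi_table_header *t)
{
    if (memcmp(t->signature, "DMAR", 4) != 0)
        return;
    t->signature[0] = 'X';
    t->checksum -= 'X' - 'D';   /* signature byte grew by 20, so subtract 20 */
}
```

Adjusting the checksum by the difference, rather than recomputing it over the whole table, keeps the zap a constant-time operation.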

Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Mon, 7 Sep 2009 13:26:06 +0000 (14:26 +0100)]
Fix etags invocation

Don't fail in the case where 'etags' isn't Exuberant Ctags

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Mon, 7 Sep 2009 12:52:48 +0000 (13:52 +0100)]
xend: passthrough: add an option pci-passthrough-strict-check

Currently, when assigning a device to an HVM guest, we use the strict
check by default. (For PV guests we automatically use the loose check
if necessary.)

When assigning a device to an HVM guest, if we run into co-assignment
issues or the ACS issue (see changeset 20081: 4a517458406f), we could
try changing the option to 'no'; however, we have to realize this may
introduce a security issue, and we cannot be sure the device assignment
will really work properly even after doing this.

The option is located in /etc/xen/xend-config.sxp:
(pci-passthrough-strict-check yes)

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 12:52:17 +0000 (13:52 +0100)]
vt-d: don't treat IOAPIC RTE of dest_SMI type specially.

We also need to create an IRTE for it, since we enable EIM and clear
CFI; otherwise the IOAPIC RTE's interrupt message would be blocked by
the IR unit.

In io_apic_read_remap_rte(), we now use
"apic_pin_2_ir_idx[apic][ioapic_pin]"
rather than "(remap_rte->index_15 << 15) | remap_rte->index_0_14" to
avoid the "interrupt remapping table out of bound error".
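
A sketch of how the two index fields combine into the 16-bit IRTE index, with a range check when the fields are written. The struct and helpers are hypothetical. An index read back from an RTE is only as valid as whatever was written there, whereas a lookup table records only indices that were actually allocated.

```c
#include <assert.h>

/* Hypothetical view of the two index fields of a remappable-format RTE. */
struct remap_index_fields {
    unsigned int index_15   : 1;   /* bit 15 of the IRTE index */
    unsigned int index_0_14 : 15;  /* bits 14..0 of the IRTE index */
};

static unsigned int irte_index_get(struct remap_index_fields f)
{
    return ((unsigned int)f.index_15 << 15) | f.index_0_14;
}

static struct remap_index_fields irte_index_set(unsigned int index)
{
    struct remap_index_fields f;

    assert(index < (1u << 16));    /* the IRTE index is 16 bits wide */
    f.index_15   = (index >> 15) & 1;
    f.index_0_14 = index & 0x7fff;
    return f;
}
```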

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 12:51:55 +0000 (13:51 +0100)]
vt-d: some small fixes to apic_pin_2_ir_idx

1) apic_pin_2_ir_idx should be int** rather than unsigned int**,
because we use the int -1 to indicate that the related IRTE index is
not allocated.
2) apic_pin_2_ir_idx shouldn't be re-initialized when resuming from S3.
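
A sketch of both points, using calloc()/malloc() as user-space stand-ins for Xen's allocator. The table holds int so that -1 can mean "no IRTE allocated". The init function returns early if the table already exists, so a second call (for example on S3 resume) preserves the existing mappings. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* [ioapic][pin] -> IRTE index; -1 means "no IRTE allocated yet". */
static int **apic_pin_2_ir_idx;

/*
 * Allocate the table and mark every entry unallocated. If the table
 * already exists (e.g. when resuming from S3), keep it untouched.
 * Returns 0 on success, -1 on allocation failure.
 */
static int init_apic_pin_2_ir_idx(int nr_ioapics, const int *nr_pins)
{
    if (apic_pin_2_ir_idx != NULL)
        return 0;                       /* don't re-init on resume */

    int **tbl = calloc((size_t)nr_ioapics, sizeof(*tbl));
    if (tbl == NULL)
        return -1;

    for (int a = 0; a < nr_ioapics; a++) {
        assert(nr_pins[a] > 0);
        tbl[a] = malloc((size_t)nr_pins[a] * sizeof(**tbl));
        if (tbl[a] == NULL) {
            while (a--)                 /* undo earlier allocations */
                free(tbl[a]);
            free(tbl);
            return -1;
        }
        for (int p = 0; p < nr_pins[a]; p++)
            tbl[a][p] = -1;
    }
    apic_pin_2_ir_idx = tbl;
    return 0;
}
```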

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 12:51:37 +0000 (13:51 +0100)]
x86: Some cleanups for apic_write, apic_read, apic_wrmsr, apic_rdmsr

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 12:51:19 +0000 (13:51 +0100)]
vt-d: replace the gdprintk with dprintk since it isn't in guest context.

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 12:50:55 +0000 (13:50 +0100)]
pygrub: trap exception when python module import fails

Fix the handling of failures when importing the 'crypt' module or
calling crypt.crypt in pygrub. The exception is written on the same
line as the "Failed!" message, but only if there is an exception. If
there is no exception, we don't bother users with details (probably
the password they entered was wrong), so we just display the "Failed!"
message. Also, hasPassword() was rewritten so that it no longer has a
try/except block.

Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Mon, 7 Sep 2009 12:49:35 +0000 (13:49 +0100)]
vt-d: avoid obtaining iommu->register_lock too early in
dma_msi_set_affinity()

If set_desc_affinity() fails, the current code doesn't release the
spinlock. We should take the lock later, after set_desc_affinity() has
succeeded.
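
A sketch of the corrected ordering: do the fallible computation first and take the lock only around the register update, so the error path has nothing to release. A toy lock flag stands in for the spinlock, and compute_dest() is a hypothetical stand-in for set_desc_affinity().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock flag standing in for a spinlock so the ordering can be checked. */
struct toy_lock { bool held; };

static void lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void lock_release(struct toy_lock *l) { assert(l->held); l->held = false; }

struct fake_iommu {
    struct toy_lock register_lock;
    unsigned int dest;              /* stands in for the MSI registers */
};

/* Hypothetical stand-in for set_desc_affinity(): fails on an empty mask. */
static int compute_dest(unsigned int cpu_mask, unsigned int *dest)
{
    if (cpu_mask == 0)
        return -1;
    *dest = cpu_mask;               /* placeholder for the real computation */
    return 0;
}

static void set_affinity(struct fake_iommu *iommu, unsigned int cpu_mask)
{
    unsigned int dest;

    /* Fallible work first, before the lock is taken ... */
    if (compute_dest(cpu_mask, &dest) != 0)
        return;                     /* nothing to release on this path */

    /* ... then hold the lock only around the register update. */
    lock_acquire(&iommu->register_lock);
    iommu->dest = dest;
    lock_release(&iommu->register_lock);
}
```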

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 08:00:21 +0000 (09:00 +0100)]
xend: Enable to set config variables in /etc/sysconfig/xend

The attached patch makes it possible to set the environment variables
for xend in /etc/sysconfig/xend.

There are four variables.

XENCONSOLED_TRACE=[none|guest|hv|all]
XENSTORED_ROOTDIR=/var/lib/xenstored
XENSTORED_TRACE=[yes|on|1]
XENBACKENDD_DEBUG=[yes|on|1]

XENCONSOLED_TRACE and XENSTORED_ROOTDIR take strings that are passed
as options to the respective commands; if these variables are
non-empty, they are exported.
XENSTORED_TRACE and XENBACKENDD_DEBUG are exported if they are set to
"yes", "on" or "1".

From: Kazuhiro SUZUKI <kaz@jp.fujitsu.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 7 Sep 2009 07:48:12 +0000 (08:48 +0100)]
xend: Revert c/s 17536 which breaks PV passthru of MSI-X devices.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 7 Sep 2009 07:46:46 +0000 (08:46 +0100)]
Add the support of x2apic logical cluster mode.
Add a xen boolean parameter 'x2apic'.
Add a xen boolean parameter 'x2apic_phys' (by default, we use logical
cluster mode).
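
In x2APIC cluster mode, the Intel SDM derives the logical ID from the physical x2APIC ID. Bits 31:16 hold the cluster (x2APIC ID bits 19:4), and bits 15:0 hold a one-hot bit for the CPU within the cluster (x2APIC ID bits 3:0). A sketch of that derivation follows; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Logical x2APIC ID: cluster in bits 31:16, one-hot member in bits 15:0. */
static uint32_t x2apic_logical_id(uint32_t x2apic_id)
{
    uint32_t cluster = x2apic_id >> 4;   /* shifting by 16 keeps ID bits 19:4 */
    uint32_t member  = x2apic_id & 0xf;

    return (cluster << 16) | (1u << member);
}
```

For example, x2APIC IDs 0x10 and 0x11 map to 0x10001 and 0x10002, so a single logical destination of 0x10003 reaches both CPUs in cluster 1.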

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 07:46:03 +0000 (08:46 +0100)]
vt-d: use 32-bit Destination ID when Interrupt Remapping with EIM is
enabled

When x2APIC and Interrupt Remapping (IR) with EIM are enabled, we
should use a 32-bit Destination ID for IOAPIC and MSI.

We implemented IR support in Xen by hooking functions like
io_apic_write(), io_apic_modify() and write_msi_message(); as a
result, the hook functions in intremap.c can only see the 8-bit
dest id rather than the 32-bit id, so we can't set an IR table entry
that requires a 32-bit dest id.

To solve the issue thoroughly, we would need to find every place in
io_apic.c and msi.c that could write an ioapic RTE or a device's msi
message and explicitly handle the 32-bit dest id carefully (namely,
when genapic is x2apic, cpu_mask_to_apic could return a 32-bit value);
and we would have to change the iommu_ops->{.update_ire_from_apic,
.update_ire_from_msi} interfaces. We might have to write an
over-1000-LOC patch for this.

Instead, we could use a workaround:
1) for ioapic, in the struct IO_APIC_route_entry, we could use a new
"dest32" to refer to the dest field;
2) for msi, in the struct msi_msg, we could add a new "u32 dest".
And in intremap.c, if x2apic_enabled, we use the new names to refer to
the dest fields.
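
A sketch of workaround 2) for MSI. A simplified struct msi_msg gains a 32-bit dest, which is used when x2APIC is enabled. Otherwise the destination is the 8-bit field in bits 19:12 of the MSI address. The layout shown is a simplification and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified MSI message; `dest` is the extra 32-bit field. */
struct msi_msg {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
    uint32_t dest;          /* full destination ID, valid in x2APIC mode */
};

/* The legacy MSI address only carries an 8-bit destination in bits 19:12. */
#define MSI_ADDR_DEST_SHIFT 12
#define MSI_ADDR_DEST_MASK  0xffu

static uint32_t msi_dest_id(const struct msi_msg *msg, bool x2apic_enabled)
{
    if (x2apic_enabled)
        return msg->dest;
    return (msg->address_lo >> MSI_ADDR_DEST_SHIFT) & MSI_ADDR_DEST_MASK;
}
```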

We can improve this in future.

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 07:44:50 +0000 (08:44 +0100)]
vt-d: enhance the support of Interrupt Remapping EIM and x2APIC

1) Clear the Interrupt Remapping (IR) unit's CFI (Compatibility Format
Interrupt) to enhance security;
2) Move iommu_setup() earlier, before we begin to use the IOAPIC, so
that once Interrupt Remapping is enabled, the later IOAPIC (and MSI)
initialization sets up IOAPIC RTEs (and MSI) in remappable format;
3) Enable x2APIC only when all VT-d engines support IR with EIM
(Extended Interrupt Mode). EIM enables external devices to deliver
interrupts to logical processors with APIC IDs wider than 8 bits.
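
A sketch of point 3), assuming each unit's Extended Capability Register value is available. In the VT-d specification's ECAP layout, IR is bit 3 and EIM is bit 4. Returning false when there are no units is a policy assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* VT-d Extended Capability Register bits. */
#define ECAP_IR  (1ull << 3)   /* interrupt remapping supported */
#define ECAP_EIM (1ull << 4)   /* extended interrupt mode (32-bit dest IDs) */

/* x2APIC is only enabled if every remapping unit supports IR with EIM. */
static bool can_enable_x2apic(const uint64_t *ecap, size_t n_units)
{
    if (n_units == 0)
        return false;
    for (size_t i = 0; i < n_units; i++)
        if ((ecap[i] & (ECAP_IR | ECAP_EIM)) != (ECAP_IR | ECAP_EIM))
            return false;
    return true;
}
```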

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 07:44:00 +0000 (08:44 +0100)]
x86/mmcfg: misc adjustments

- fix the mapping range (end_bus_number is inclusive)
- fix the mapping base address (shifting segment by 22 was set for
  overlapping mappings; assuming the goal was to reduce the virtual
  space used when less than 256 busses are present on all segments,
  adding logic to determine the smallest possible shift value)
- fix PCI_MCFG_VIRT_END, and actually use it to avoid creating
  mappings outside the designated range
- fix address calculations (segment numbers must be converted to long
  to avoid truncation)
- add a way (command line option) to suppress the use of mmconfig as
  well as to actually use the AMD Fam10 special code
- correct __init annotations
- use xmalloc()/xmalloc_array() in favor of xmalloc_bytes()
- eliminate dead code and data
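
A sketch of the arithmetic behind the first few items, assuming the usual 1 MiB of configuration space per bus. The window size counts end_bus_number inclusively. The per-segment shift is the smallest one whose stride covers buses 0..max_end_bus. The segment number is widened to 64 bits before shifting. All names are hypothetical, and the PCI_MCFG_VIRT_START base is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define MMCFG_BUS_SHIFT 20   /* 1 MiB per bus: 32 devices x 8 functions x 4 KiB */

/* Bytes covered by buses start_bus..end_bus; end_bus is inclusive. */
static uint64_t mmcfg_window_size(unsigned int start_bus, unsigned int end_bus)
{
    assert(start_bus <= end_bus && end_bus <= 255);
    return (uint64_t)(end_bus - start_bus + 1) << MMCFG_BUS_SHIFT;
}

/* Smallest per-segment shift whose stride covers buses 0..max_end_bus. */
static unsigned int mmcfg_segment_shift(unsigned int max_end_bus)
{
    unsigned int shift = MMCFG_BUS_SHIFT;

    assert(max_end_bus <= 255);
    while ((1u << (shift - MMCFG_BUS_SHIFT)) < max_end_bus + 1)
        shift++;
    return shift;
}

/* Offset of a config register; the segment is widened before shifting. */
static uint64_t mmcfg_offset(unsigned int seg, unsigned int shift,
                             unsigned int bus, unsigned int dev,
                             unsigned int fn, unsigned int reg)
{
    assert(bus <= 255 && dev < 32 && fn < 8 && reg < 4096);
    return ((uint64_t)seg << shift) | ((uint64_t)bus << MMCFG_BUS_SHIFT) |
           (dev << 15) | (fn << 12) | reg;
}
```

Without the (uint64_t) cast, segment 16 shifted by 28 would overflow a 32-bit value and wrap to 0.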

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Mon, 7 Sep 2009 07:43:14 +0000 (08:43 +0100)]
amd iommu: Remove a useless flag and fix I/O page fault for hvm
passthru devices.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Mon, 7 Sep 2009 07:42:50 +0000 (08:42 +0100)]
amd iommu: Cleanup initialization functions and fix a fatal page fault
caused by out-of-bounds access to irq_to_iommu array.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Mon, 7 Sep 2009 07:41:45 +0000 (08:41 +0100)]
x86-64/mmcfg: add explicit support for nVidia MCP55

This is a simple port from Linux 2.6.31-rc8.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Mon, 7 Sep 2009 07:41:00 +0000 (08:41 +0100)]
x86: convert frame_table to a #define

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Mon, 7 Sep 2009 07:40:33 +0000 (08:40 +0100)]
Tidy evtchn keyhandler a little

Get rid of all the -1s and label the pending and masked columns.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Mon, 7 Sep 2009 07:38:39 +0000 (08:38 +0100)]
xend: passthrough: fix physdev_map_pirq invocation

For devices that don't have INTx (like VFs), avoid calling map_pirq;
otherwise the guest cannot be started successfully.

Also avoid calling this hypercall for HVM guests, since this is done in
the device model.

Signed-off-by: Qing He <qing.he@intel.com>
Keir Fraser [Mon, 7 Sep 2009 07:37:58 +0000 (08:37 +0100)]
Fix some issues for HVM log dirty:
* Add the necessary dirty logging in qemu to avoid guest errors with
intensive disk access during live migration
* Replace the shared memory between qemu and the migration tools with a
newly added hypercall, which is cleaner and simpler
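
A sketch of the dirty-logging side. When the device model writes guest memory (for example when emulating a disk DMA), the affected frames must be marked in the log-dirty bitmap so that migration resends them. The bitmap layout and helper names are assumptions, and the new hypercall is not shown.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Mark guest frames [first_pfn, first_pfn + nr) as dirty. */
static void mark_dirty(unsigned long *bitmap, unsigned long first_pfn,
                       unsigned long nr)
{
    for (unsigned long pfn = first_pfn; pfn < first_pfn + nr; pfn++)
        bitmap[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Nonzero if `pfn` has been marked dirty. */
static int test_dirty(const unsigned long *bitmap, unsigned long pfn)
{
    return (bitmap[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1;
}
```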

Signed-Off-By: Zhai, Edwin <edwin.zhai@intel.com>
Keir Fraser [Fri, 4 Sep 2009 07:43:05 +0000 (08:43 +0100)]
x86: Fix PoD cache size when decreasing memory

Certain paths through p2m_pod_decrease_reservation() fail to reduce
the size of the PoD cache if the number of outstanding entries is less
than the size of the cache.  Rearrange so this doesn't happen.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Fri, 4 Sep 2009 07:42:10 +0000 (08:42 +0100)]
xend: Support "bootloader" mode for "drbd:" devices

To be able to use "bootloader" on drbd devices the following changes
need to be made:

*) Translation of the device name

_parse_uname, which is used by blkdev_uname_to_file, which in turn is
used by _configureBootloader in XendDomainInfo, needs to be able to
resolve drbd resources to the corresponding block device to pass to
the configured bootloader.

*) Activation of drbd device

If the drbd device isn't in Primary mode when the bootloader tries to
fetch the kernel and initrd, the start of the DomU will fail. To
prevent this the given drbd device will be made Primary before the
bootloader gets executed.

A note on the naming of drbd resources: drbd mostly uses resource
names in its userland tools. Because of that, drbd VBDs configured
with the "drbd:" type should always use the drbd resource name, as
suggested by the drbd documentation at
http://www.drbd.org/users-guide-emb/s-xen-configure-domu.html. My
patches assume that the VBDs are named accordingly.

Signed-off-by: Michael Renner <michael.renner@geizhals.at>
Keir Fraser [Fri, 4 Sep 2009 07:34:45 +0000 (08:34 +0100)]
xend: fix domain_migrate

When a guest (a pv-on-hvm guest that cannot suspend) reboots during
live migration, the disconnection on the source side is not
transmitted to the destination side. As a result, the error handling
on the destination side is not executed.

Signed-off-by: Tomonari Horikoshi <t.horikoshi@jp.fujitsu.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 3 Sep 2009 08:51:37 +0000 (09:51 +0100)]
vt-d: fix Dom0 S3 resume.

When resuming from Dom0 S3, here 'irq' is -1, so we can't use it at
all. We should always use iommu->irq.

With the patch applied on the current tip 20153 and using the 2.6.18
Dom0, Dom0 S3 works fine (at least on my DQ35).

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Thu, 3 Sep 2009 08:50:46 +0000 (09:50 +0100)]
x86 vpt: Small performance fixes.

1. Once a one-shot timer had fired, its IRQ was raised repeatedly
forever; fix this.
2. Test pending_intr_nr before pt_irq_masked(), as it is cheaper.
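
A sketch of both fixes using a hypothetical, simplified timer. A one-shot timer records that it has fired and stops raising further interrupts. The cheap pending_intr_nr test comes first, so && skips the more expensive mask check when nothing is pending. irq_masked() only counts its calls, which makes the short-circuit visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified periodic/one-shot virtual timer. */
struct vtimer {
    bool one_shot;
    bool expired;                 /* one-shot timer has already fired */
    unsigned int pending_intr_nr; /* ticks not yet delivered to the guest */
    unsigned int masked_checks;   /* counts calls to the "expensive" check */
};

/* Stand-in for pt_irq_masked(); pretend it is costly to evaluate. */
static bool irq_masked(struct vtimer *t)
{
    t->masked_checks++;
    return false;
}

/* Called when the timer's deadline passes. */
static void vtimer_fire(struct vtimer *t)
{
    if (t->one_shot && t->expired)
        return;                   /* a one-shot timer must not re-raise */
    t->pending_intr_nr++;
    if (t->one_shot)
        t->expired = true;
}

/* Should an interrupt be injected now? Cheap test first. */
static bool vtimer_should_inject(struct vtimer *t)
{
    return t->pending_intr_nr != 0 && !irq_masked(t);
}
```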

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Keir Fraser [Thu, 3 Sep 2009 08:49:41 +0000 (09:49 +0100)]
xm: Add "tap2" to attach blocktap disks to VM

I detected a problem when using XenAPI.  When I started a VM using the
xm create command, blocktap disks were not attached to the
VM.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Thu, 3 Sep 2009 06:37:27 +0000 (07:37 +0100)]
x86: com devices's irqaction shouldn't free.

Since the irqs of serial devices are initialized early in Xen and
their irqaction is not allocated from the heap, it doesn't need to be
freed in the irq release logic.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 3 Sep 2009 06:29:29 +0000 (07:29 +0100)]
[IOMMU] dynamic VTd page table for HVM guest

This patch makes the HVM VT-d page table dynamic, just as for PV
guests, to avoid the overhead of maintaining the page table until a
PCI device is actually assigned to the HVM guest.

Signed-Off-By: Zhai, Edwin <edwin.zhai@intel.com>
Keir Fraser [Wed, 2 Sep 2009 15:15:05 +0000 (16:15 +0100)]
libxenguest: Remove unused static inline function is_loadable_phdr()

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Wed, 2 Sep 2009 15:12:41 +0000 (16:12 +0100)]
Enable some SCSI drivers in pvops kernel config

Enables a couple of SCSI host controllers which are found in our test
farm but not enabled in the default upstream kernel.  The new drivers
are compiled as modules which is pretty harmless so this should be
safe.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Wed, 2 Sep 2009 10:40:04 +0000 (11:40 +0100)]
x86: Remove the redundant logic in set_msi_affinity

Remove the redundant logic in set_msi_affinity. It was introduced
accidentally, probably through a mistake when I generated the patch.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Wed, 2 Sep 2009 10:39:27 +0000 (11:39 +0100)]
xm: Make cpu_{cap|weight} available when using XenAPI

Currently, the cpu_weight and cpu_cap parameters in domain
configuration files are ignored when using XenAPI.
This patch makes the parameters available.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Wed, 2 Sep 2009 10:39:02 +0000 (11:39 +0100)]
x86: rdtsc emulation (PV and HVM) must be monotonically increasing

The Intel SDM (section 18.10) clearly states that rdtsc
returns a "monotonically increasing unique value".
The current rdtsc emulation code (both PV and HVM) returns
only a monotonically non-decreasing (non-unique) value,
so this patch ensures that a stale value is always incremented.
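
A sketch of the rule using a shared last-returned value and a C11 compare-and-swap loop. If the raw value is not larger than the last one handed out, the function returns last + 1 instead. How the raw value is obtained (host TSC, scaling, offsets) is out of scope, and the names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Last value handed out by the emulated RDTSC (shared by all vCPUs). */
static _Atomic uint64_t last_tsc;

/*
 * Return a strictly increasing TSC value: if `raw` is not larger than
 * the last value returned, hand out last + 1 instead.
 */
static uint64_t emulated_rdtsc(uint64_t raw)
{
    uint64_t prev = atomic_load(&last_tsc);
    uint64_t next;

    do {
        next = raw > prev ? raw : prev + 1;
    } while (!atomic_compare_exchange_weak(&last_tsc, &prev, next));

    return next;
}
```

The compare-and-swap loop keeps values unique even when several vCPUs emulate rdtsc concurrently. A failed exchange reloads prev and recomputes next.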

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 2 Sep 2009 10:38:24 +0000 (11:38 +0100)]
pygrub: Match bare-metal GRUB behavior for passwords

The password support patch already merged didn't match the bare-metal
GRUB behavior, so I created a patch to match it. If a password is set
in the grub.conf file, pressing `p` is required, exactly as with "real"
(bare-metal) GRUB. New options become available after the correct
password is entered.

Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Tue, 1 Sep 2009 10:36:51 +0000 (11:36 +0100)]
x86 hvm: remove pt_reset()

Virtual platform timers are not sync'ed with guest's TSC any more
since c/s 17716. Thus pt_reset is now useless.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Keir Fraser [Tue, 1 Sep 2009 10:36:16 +0000 (11:36 +0100)]
x86 passthru:: graphics passthrough

This patch supports basic gfx passthrough on the Xen side:
  - add a VGA type for gfx passthrough, and get the size of the VGA BIOS
  of the passed-through gfx device in hvmloader
  - add a config option 'gfx_passthru' for gfx passthrough

Signed-off-by: Ben Lin <ben.y.lin@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Tue, 1 Sep 2009 10:34:31 +0000 (11:34 +0100)]
x86: Make the hypercall PHYSDEVOP_alloc_irq_vector hypercall dummy.

This patch makes the PHYSDEVOP_alloc_irq_vector hypercall a dummy and
defers vector allocation until dom0 programs the ioapic entries.
Basically, dom0 shouldn't touch the vector namespace, which is only
used by the hypervisor for servicing real devices' interrupts. This
patch also makes the broken NetBSD dom0 work again.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Tue, 1 Sep 2009 10:32:47 +0000 (11:32 +0100)]
[IA64] Further irq-vector fix.

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
16 years agoxend: Fix c/s 20137 -- do not redefine built-in name 'str'.
Keir Fraser [Mon, 31 Aug 2009 17:17:26 +0000 (18:17 +0100)]
xend: Fix c/s 20137 -- do not redefine built-in name 'str'.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agox86 hvm: Clean up VLAPIC interfaces a little, and fix vlapic_ipi().
Keir Fraser [Mon, 31 Aug 2009 09:54:32 +0000 (10:54 +0100)]
x86 hvm: Clean up VLAPIC interfaces a little, and fix vlapic_ipi().

A boolean flag was overflowing a uint8_t.
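
The commit does not name the affected field, so the following only
illustrates this class of bug, assuming the flag is a bit above bit 7 (the
FLAG_BIT value and both function names are hypothetical): masking out a high
bit and storing the result in a uint8_t keeps only the low 8 bits, so the flag
always reads as zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit above bit 7, e.g. a trigger-mode bit in an
 * ICR-like register value. */
#define FLAG_BIT 0x8000u

/* Buggy: 0x8000 does not fit in uint8_t; the conversion keeps only the
 * low 8 bits, so the stored flag is always 0. */
uint8_t flag_buggy(uint32_t reg)
{
    uint8_t flag = reg & FLAG_BIT;
    return flag;
}

/* Fixed: normalise to true/false before narrowing. */
bool flag_fixed(uint32_t reg)
{
    return (reg & FLAG_BIT) != 0;
}
```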

Thanks to Dongxiao Xu at Intel for tracking down the bug.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years ago[IA64] Fix serial console freeze issue
Keir Fraser [Mon, 31 Aug 2009 09:17:09 +0000 (10:17 +0100)]
[IA64] Fix serial console freeze issue

Changeset 20110:6e83b0ec2d70 is incomplete: irq_to_vector() is still
required, otherwise the serial console freezes when sync_console is not
used.

I confirmed that dom0 now boots without sync_console.

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
16 years agolibxc: Avoid a constant-zero-sized memset().
Keir Fraser [Mon, 31 Aug 2009 09:14:26 +0000 (10:14 +0100)]
libxc: Avoid a constant-zero-sized memset().

Some build environments warn about this, and the warning fails the build.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoxend: Greater verbosity on domain creation failure
Keir Fraser [Mon, 31 Aug 2009 09:12:10 +0000 (10:12 +0100)]
xend: Greater verbosity on domain creation failure

This patch makes error reporting more verbose when xc.domain_create()
fails or raises an exception.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
16 years agox86/numa: fix c/s 20120 (Fix SRAT check for discontig memory)
Keir Fraser [Mon, 31 Aug 2009 09:10:17 +0000 (10:10 +0100)]
x86/numa: fix c/s 20120 (Fix SRAT check for discontig memory)

That change replaced the (wrong) assumption that each node's memory is
contiguous with a similarly wrong assumption that it is discontiguous
(i.e. that each node has separate E820 table entries). The code ought to
be able to deal with both, and I hope this change makes it do so.
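
One way to handle both layouts is to check each usable E820 range against the
union of the SRAT memory ranges, rather than requiring a single matching SRAT
entry. The sketch below is a hypothetical illustration of that idea, not the
Xen code; the range type and function names are invented, and the caller is
assumed to pass only usable (RAM) E820 ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open physical address range [start, end). */
struct range {
    uint64_t start, end;
};

/* True if [start, end) is covered by the union of the SRAT ranges. Works
 * whether one SRAT entry spans several E820 entries or one E820 entry
 * spans several adjacent SRAT entries. */
static bool covered(uint64_t start, uint64_t end,
                    const struct range *srat, size_t nsrat)
{
    bool progress = true;

    /* Repeatedly advance 'start' past any SRAT range that contains it;
     * each SRAT range can advance it at most once, so this terminates. */
    while (start < end && progress) {
        progress = false;
        for (size_t i = 0; i < nsrat; i++) {
            if (srat[i].start <= start && start < srat[i].end) {
                start = srat[i].end;
                progress = true;
            }
        }
    }
    return start >= end;
}

/* Validate that every usable E820 range is covered by SRAT memory. */
bool srat_covers_e820(const struct range *e820, size_t ne820,
                      const struct range *srat, size_t nsrat)
{
    for (size_t i = 0; i < ne820; i++)
        if (!covered(e820[i].start, e820[i].end, srat, nsrat))
            return false;
    return true;
}
```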

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alex Williamson <alex.williamson@hp.com>
16 years agoproperly __initdata-annotate command line option string buffers
Keir Fraser [Mon, 31 Aug 2009 09:09:12 +0000 (10:09 +0100)]
properly __initdata-annotate command line option string buffers

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agox86: properly __init-annotate time.c
Keir Fraser [Mon, 31 Aug 2009 09:08:38 +0000 (10:08 +0100)]
x86: properly __init-annotate time.c

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agointroduce size_param()
Keir Fraser [Mon, 31 Aug 2009 09:06:53 +0000 (10:06 +0100)]
introduce size_param()

Since several custom_param() handlers do nothing but invoke
parse_size_and_unit(), it makes sense to introduce a simplifying
abstraction.

Also fix serial_txbufsz, which was not guaranteed to be a power of
two.
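
A size option of this kind reduces to "parse a number with an optional unit
suffix". The sketch shows that parsing plus power-of-two rounding of the kind
a ring buffer such as serial_txbufsz needs; parse_size() and round_up_pow2()
are hypothetical names, and the binary K/M/G suffix handling is an assumption,
not a copy of parse_size_and_unit().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "16k", "2M", "1G" (binary, power-of-1024 suffixes) into bytes;
 * a bare number is taken as bytes. Hypothetical stand-in only. */
uint64_t parse_size(const char *s)
{
    char *end;
    uint64_t val = strtoull(s, &end, 0);

    switch (*end) {
    case 'G': case 'g': val <<= 10; /* fall through */
    case 'M': case 'm': val <<= 10; /* fall through */
    case 'K': case 'k': val <<= 10; break;
    default: break;
    }
    return val;
}

/* Round up to the next power of two, e.g. for a ring whose index
 * arithmetic masks with (size - 1). */
uint64_t round_up_pow2(uint64_t x)
{
    uint64_t p = 1;

    assert(x <= (UINT64_C(1) << 63));
    while (p < x)
        p <<= 1;
    return p;
}
```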

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agox86_emulate: honor failure of in_longmode()
Keir Fraser [Mon, 31 Aug 2009 08:54:25 +0000 (09:54 +0100)]
x86_emulate: honor failure of in_longmode()

Failure of in_longmode() shouldn't be treated the same as the function
returning 'true'.
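
In C, an int-returning function that signals failure with a negative value is
truthy under a plain `if`. This sketch assumes in_longmode() follows a
negative-on-error, 1-for-long-mode, 0-otherwise convention (the commit message
does not spell this out); every name below is hypothetical.

```c
#include <assert.h>

/* Hypothetical mode query: 1 = long mode, 0 = not, < 0 = failure. */
typedef int (*mode_query_fn)(void);

enum outcome { OUT_ERROR, OUT_LONG_MODE, OUT_LEGACY_MODE };

/* Buggy pattern: any non-zero result, including an error, means "true". */
enum outcome check_buggy(mode_query_fn query)
{
    return query() ? OUT_LONG_MODE : OUT_LEGACY_MODE;
}

/* Fixed pattern: test for failure before interpreting the result. */
enum outcome check_fixed(mode_query_fn query)
{
    int rc = query();

    if (rc < 0)
        return OUT_ERROR;   /* propagate the failure */
    return rc ? OUT_LONG_MODE : OUT_LEGACY_MODE;
}

/* Example queries, for illustration only. */
int query_fails(void)  { return -1; }
int query_long(void)   { return 1; }
int query_legacy(void) { return 0; }
```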

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agox86, ept: remove execute permission for granted pages' P2M entries
Keir Fraser [Mon, 31 Aug 2009 08:51:45 +0000 (09:51 +0100)]
x86, ept: remove execute permission for granted pages' P2M entries

When backporting c/s 20026 I noticed that granted pages get execute
permission, which doesn't seem desirable (and has been avoided for PV
guests for quite a while).

Even for p2m_mmio_direct it seems suspicious to allow execution, but
since I am less certain here, I left it as is for the time being.
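
The policy can be pictured as a per-type permission table. The EPT bit
positions (bit 0 read, bit 1 write, bit 2 execute) follow the Intel SDM; the
P2M type subset and the permissions chosen for non-grant types are
assumptions for illustration, not Xen's actual mapping code.

```c
#include <assert.h>
#include <stdint.h>

/* EPT entry permission bits. */
#define EPT_R (1u << 0)
#define EPT_W (1u << 1)
#define EPT_X (1u << 2)

/* Hypothetical subset of P2M types. */
enum p2m_type {
    P2M_RAM_RW,
    P2M_RAM_RO,
    P2M_GRANT_MAP_RW,
    P2M_GRANT_MAP_RO,
    P2M_MMIO_DIRECT,
};

/* Granted pages get no execute permission; MMIO keeps X, mirroring the
 * "left as is for the time being" note above. */
uint8_t ept_access_for(enum p2m_type t)
{
    switch (t) {
    case P2M_RAM_RW:       return EPT_R | EPT_W | EPT_X;
    case P2M_RAM_RO:       return EPT_R | EPT_X;
    case P2M_GRANT_MAP_RW: return EPT_R | EPT_W;
    case P2M_GRANT_MAP_RO: return EPT_R;
    case P2M_MMIO_DIRECT:  return EPT_R | EPT_W | EPT_X;
    }
    return 0;   /* unknown type: no access */
}
```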

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agoAdjust non-default sized console ring allocation
Keir Fraser [Mon, 31 Aug 2009 08:51:05 +0000 (09:51 +0100)]
Adjust non-default sized console ring allocation

Using xmalloc() for objects that are guaranteed to be at least as
large as a page is wasteful, as it will always result in more (here:
double the amount) being allocated.

The other adjustments are more cosmetic:
- conring and conring_size can be updated in an order that ensures
  NMI/MCE-generated messages never use the new (larger) size with the
  old (smaller) buffer.
- The size printed can be in KiB (for the value to be easier to grasp)
  since it is always a multiple of the default of 16KiB.
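
The update ordering can be sketched with C11 atomics: publish the new buffer
first, then the new size with release semantics, and have readers load the
size (acquire) before the buffer. A reader that sees the new size is then
guaranteed to see the new buffer, and a reader that sees the old size is safe
with either buffer. This is a hypothetical sketch, not Xen's code (which uses
its own barrier primitives); it assumes other writers are quiesced during the
copy, and the old buffer is deliberately not freed because a reader may
still hold it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical console ring state read from NMI/MCE context. */
static char default_ring[16 * 1024];
static _Atomic(char *) conring = default_ring;
static _Atomic size_t conring_size = sizeof(default_ring);

/* Reader: size first (acquire), then buffer. */
void conring_snapshot(char **buf, size_t *size)
{
    *size = atomic_load_explicit(&conring_size, memory_order_acquire);
    *buf = atomic_load_explicit(&conring, memory_order_relaxed);
}

/* Writer: copy contents, publish the larger buffer, then the size. */
int conring_grow(size_t new_size)
{
    size_t old_size = atomic_load(&conring_size);
    char *old = atomic_load(&conring);
    char *buf;

    if (new_size <= old_size)
        return -1;
    buf = calloc(1, new_size);
    if (buf == NULL)
        return -1;
    memcpy(buf, old, old_size);
    atomic_store_explicit(&conring, buf, memory_order_relaxed);
    atomic_store_explicit(&conring_size, new_size, memory_order_release);
    return 0;   /* old buffer intentionally kept alive */
}
```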

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agox86: fix get_free_pirq
Keir Fraser [Mon, 31 Aug 2009 08:47:30 +0000 (09:47 +0100)]
x86: fix get_free_pirq

GSIs should not be allocated for other purposes, so change the
hard-coded limit.

Also fix the check after the loop: it should use '<' instead of
'=='.
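
The commit does not show the loop, so this is only a sketch of why a
post-loop '==' test goes wrong for a downward scan that must stop at the GSI
boundary; the names, the in_use array and the scan direction are assumptions.
When no slot is free, the loop exits with i one below the limit, so only a
'<' test detects the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical allocator: scan downward from the top of the PIRQ space,
 * never dipping into the GSI range [0, nr_gsis). */
int alloc_non_gsi_pirq(const bool *in_use, int nr_pirqs, int nr_gsis)
{
    int i;

    for (i = nr_pirqs - 1; i >= nr_gsis; i--)
        if (!in_use[i])
            break;

    /* Nothing free leaves i == nr_gsis - 1; testing '== nr_gsis' would
     * hand out a GSI slot and reject a genuinely free slot at nr_gsis. */
    if (i < nr_gsis)
        return -ENOSPC;
    return i;
}
```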

Signed-off-by: Qing He <qing.he@intel.com>
16 years agox86: softtsc for PV domains
Keir Fraser [Thu, 27 Aug 2009 10:25:34 +0000 (11:25 +0100)]
x86: softtsc for PV domains

Implement softtsc (TSC emulation) for userland code in PV domains.  It
is currently tied to the existing "softtsc" Xen boot option (which
does the same thing but for HVM domains).  Later it should be tied to
a vm.cfg option, but this is sufficient for now to obtain performance
degradation data for PV environments that heavily utilize rdtsc.  To
display the emulation frequency, use debug-key "s".

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agox86: fix msi_free_irq().
Keir Fraser [Thu, 27 Aug 2009 09:13:13 +0000 (10:13 +0100)]
x86: fix msi_free_irq().

1) We should invoke destroy_irq() before msix_put_fixmap().
2) destroy_irq() eventually invokes mask_msi_irq(), so the duplicate
mask operation in the 'if' statement here can be removed.

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
16 years ago[HVM] add super page support for HVM migration
Keir Fraser [Thu, 27 Aug 2009 09:12:41 +0000 (10:12 +0100)]
[HVM] add super page support for HVM migration

This patch tries to allocate 2MB pages on the target side, based on
analysis of the PFN sequence sent from the source side during HVM
migration.

The algorithm is: if a pseudo-physical page is not yet populated in
the target domain, AND it is the first page of a 2MB extent, AND no
other pages in that extent are yet populated, AND the next pages in
the save-image stream populate that extent in order, THEN allocate a
superpage. If the next 511 pages (which complete the 2MB extent) are
split across a batch boundary, we have to optimistically allocate a
superpage in this batch; this is speculative, and the superpage may
have to be broken into 4KB pages in the next batch.
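
The decision rule above can be written as a predicate over one batch. This
sketch follows the stated conditions (512 4KB pages per 2MB extent); the
batch and bitmap representation, the enum and decide() are hypothetical, and
the later splitting of a speculative superpage is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SUPERPAGE_PFNS 512   /* 2MB / 4KB */

enum alloc_decision { ALLOC_4K, ALLOC_2M, ALLOC_2M_SPECULATIVE };

/* Decide how to populate batch[idx]. 'populated' has one entry per PFN
 * already present in the target domain (hypothetical representation). */
enum alloc_decision decide(const uint64_t *batch, size_t nbatch, size_t idx,
                           const bool *populated)
{
    uint64_t pfn = batch[idx];
    size_t n;

    /* Unpopulated, and the first page of a 2MB extent. */
    if (populated[pfn] || pfn % SUPERPAGE_PFNS != 0)
        return ALLOC_4K;

    /* No other page of the extent populated yet. */
    for (uint64_t p = pfn + 1; p < pfn + SUPERPAGE_PFNS; p++)
        if (populated[p])
            return ALLOC_4K;

    /* Following stream entries must fill the extent in order. */
    for (n = 1; n < SUPERPAGE_PFNS && idx + n < nbatch; n++)
        if (batch[idx + n] != pfn + n)
            return ALLOC_4K;

    /* If the batch ran out first, the allocation is speculative. */
    return n == SUPERPAGE_PFNS ? ALLOC_2M : ALLOC_2M_SPECULATIVE;
}
```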

This patch is also compatible with PV guest migration.

Signed-Off-By: Zhai Edwin <edwin.zhai@intel.com>
16 years agoxend: Do not pass pointer to a 16-bit domid_t to PyArg_ParseTuple()
Keir Fraser [Wed, 26 Aug 2009 14:41:59 +0000 (15:41 +0100)]
xend: Do not pass pointer to a 16-bit domid_t to PyArg_ParseTuple()
when it expects a full integer.
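
The safe pattern is to parse into a full int, range-check it, and only then
narrow it to domid_t. The sketch avoids a Python.h dependency, so
parse_int_arg() is a hypothetical stand-in for PyArg_ParseTuple() with the
"i" format unit, which writes a C int through the pointer it is given;
get_domid() is likewise illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t domid_t;   /* 16-bit domain ID, as in the commit */

/* Stand-in for a parser that always writes a full C int. */
static void parse_int_arg(int *out, int value)
{
    *out = value;
}

/* Parse into an int, validate, then narrow. Passing (int *)&domid
 * instead would write sizeof(int) bytes into a 2-byte object. */
int get_domid(int arg, domid_t *domid)
{
    int val;

    parse_int_arg(&val, arg);
    if (val < 0 || val > UINT16_MAX)
        return -1;
    *domid = (domid_t)val;
    return 0;
}
```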

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoxend: Flask MLS security label handling
Keir Fraser [Wed, 26 Aug 2009 14:35:14 +0000 (15:35 +0100)]
xend: Flask MLS security label handling

Changed the way security labels are handled to allow domains to be
labeled with Flask MLS security labels.  Changed the error message
generated when an invalid context is submitted to make it more useful.
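
The main parsing pitfall with MLS labels is that the level part can itself
contain colons, so only the first three colons separate fields. This sketch
assumes the SELinux-style `user:role:type[:level]` layout; the struct, buffer
sizes and parse_context() are hypothetical and are not the xend code (which
is Python).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parsed security context; field sizes are arbitrary for this sketch. */
struct sec_ctx {
    char user[64], role[64], type[64], mls[128];
};

/* Split "user:role:type[:mls]"; the MLS part may contain colons.
 * Returns 0 on success, -1 on a malformed label. */
int parse_context(const char *label, struct sec_ctx *ctx)
{
    const char *p = label;
    char *fields[3] = { ctx->user, ctx->role, ctx->type };

    for (int i = 0; i < 3; i++) {
        const char *colon = strchr(p, ':');
        size_t len = colon ? (size_t)(colon - p) : strlen(p);

        if (len == 0 || len >= sizeof(ctx->user))
            return -1;
        memcpy(fields[i], p, len);
        fields[i][len] = '\0';
        if (colon == NULL) {
            /* Only the type may be the last field (no MLS part). */
            if (i != 2)
                return -1;
            ctx->mls[0] = '\0';
            return 0;
        }
        p = colon + 1;
    }
    /* Everything after the third colon is the MLS level or range. */
    if (*p == '\0' || strlen(p) >= sizeof(ctx->mls))
        return -1;
    strcpy(ctx->mls, p);
    return 0;
}
```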

Signed-off-by: Machon B. Gregory <mbgrego@tycho.ncsc.mil>
Signed-off-by: George S. Coker, II <gscoker@alpha.ncsc.mil>
16 years agostubdom: Backport fix for SIZE_MAX from newlib 1.17.0
Keir Fraser [Tue, 25 Aug 2009 15:26:02 +0000 (16:26 +0100)]
stubdom: Backport fix for SIZE_MAX from newlib 1.17.0

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoAccurate accounting for credit scheduler
Keir Fraser [Tue, 25 Aug 2009 14:36:37 +0000 (15:36 +0100)]
Accurate accounting for credit scheduler

Rather than debiting a full 10ms of credit on a scheduler tick (which
is probabilistic), debit credits accurately based on timestamps.

The main problem this is meant to address is an attack on the
scheduler that allows a rogue guest to avoid ever being debited
credits.  The basic idea is that the rogue process periodically checks
the time (using rdtsc) and yields after 9.5ms, so that it is never
running when the tick fires.  Using this technique, a guest can "steal"
95% of the CPU.  This is particularly an issue in cloud environments.
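
The difference can be shown with a simplified model: one vCPU runs for a fixed
time in each 10ms tick period and is only charged under tick-based
accounting if it is still running when the tick fires. CREDITS_PER_MS and
both functions are hypothetical; this is not the credit scheduler's code.

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NS        10000000ULL  /* 10ms scheduler tick */
#define CREDITS_PER_MS 10ULL        /* hypothetical credit rate */

/* Tick-based: a full tick is debited only if the vCPU is still running
 * when the tick fires (simplified: it ran the whole period). */
uint64_t debit_tick_based(uint64_t run_ns_per_period, uint64_t periods)
{
    uint64_t debit = 0;

    for (uint64_t i = 0; i < periods; i++)
        if (run_ns_per_period >= TICK_NS)
            debit += TICK_NS / 1000000 * CREDITS_PER_MS;
    return debit;
}

/* Timestamp-based: debit exactly the time run, i.e. the sum of
 * (switch-out timestamp - switch-in timestamp). */
uint64_t debit_accurate(uint64_t run_ns_per_period, uint64_t periods)
{
    uint64_t ran_ns = run_ns_per_period * periods;

    return ran_ns * CREDITS_PER_MS / 1000000;
}
```

For a guest that yields after 9.5ms in each of 100 periods, tick-based
accounting charges nothing, while timestamp-based accounting charges 9500
credits in this model.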

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
16 years agoxend: Fix typos in configure_vtpm
Keir Fraser [Tue, 25 Aug 2009 13:59:09 +0000 (14:59 +0100)]
xend: Fix typos in configure_vtpm

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
16 years agox86 numa: Fix SRAT check for discontig memory
Keir Fraser [Tue, 25 Aug 2009 13:58:42 +0000 (14:58 +0100)]
x86 numa: Fix SRAT check for discontig memory

We currently compare the sum of the pages found in the SRAT table to
the address of the highest memory page found via the e820 table to
validate the SRAT.  This is completely bogus if there's any kind of
discontiguous memory, where the sum of the pages could be much smaller
than the address of the highest page.  I think all that's necessary is
to validate that each usable memory range in the e820 is covered by an
SRAT entry.  This might not be the most efficient way to do it, but
there are usually a relatively small number of entries on each side.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
16 years agoxen/xsm/flask: Fix Flask MLS context generation
Keir Fraser [Tue, 25 Aug 2009 13:58:07 +0000 (14:58 +0100)]
xen/xsm/flask: Fix Flask MLS context generation

Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov>
16 years agopygrub: Set path in #! line of pygrub, too
Keir Fraser [Tue, 25 Aug 2009 13:57:45 +0000 (14:57 +0100)]
pygrub: Set path in #! line of pygrub, too

pygrub currently has a hardcoded path of /usr/bin/python, which is not
correct if the version of Python at install time is not the same as
that at build time.  This patch uses the existing install-wrap and
python/get-path machinery.

(It does not address the currently-existing bug that the get-path
machinery works by assuming that `python' is a symlink, rather than
querying the python interpreter for its version.)

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
16 years agoxend: Add support for URI ('file:' and 'data:' scheme) for PV/kernel
Keir Fraser [Tue, 25 Aug 2009 13:56:54 +0000 (14:56 +0100)]
xend: Add support for URI ('file:' and 'data:' scheme) for PV/kernel
and PV/ramdisk

Add support for the 'file:' and 'data:' URI schemes for the parameters
'PV/kernel' and 'PV/ramdisk' in the VM.create() call. The 'data:'
scheme makes it possible to use a file stored on the management system
(from which the XenAPI call is sent) as the kernel or ramdisk.

Notes:
o all included: a detailed description can be found in the XenAPI
documentation
o bumped up the version of the API document to 1.0.8 (because of the
(minimal) interface extension)
o Future enhancements (like http:, ftp: schemes) fit seamlessly into
the current design / classes
o Unittest cases and xm-test case included
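
For orientation, a 'data:' URI (RFC 2397) carries the payload inline, usually
base64-encoded after a ";base64," marker, while a 'file:' URI names a path on
the host. The sketch resolves both forms; it is in C for consistency with the
other examples (xend itself is Python), and resolve_uri(), the "file://"
prefix handling and the base64-only restriction are assumptions, not the
XenAPI implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one base64 character, or -1 if outside the alphabet. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode base64 into out (capacity cap); '=' padding ends the data.
 * Returns bytes written, or -1 on invalid input or overflow. */
static long b64_decode(const char *in, unsigned char *out, size_t cap)
{
    unsigned long acc = 0;
    int bits = 0;
    size_t n = 0;

    for (; *in != '\0' && *in != '='; in++) {
        int v = b64_value(*in);
        if (v < 0)
            return -1;
        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n == cap)
                return -1;
            out[n++] = (unsigned char)(acc >> bits);
            acc &= (1UL << bits) - 1;   /* keep only unconsumed bits */
        }
    }
    return (long)n;
}

/* Hypothetical resolver: "file://PATH" sets *path and returns 0;
 * "data:...;base64,DATA" decodes into out and returns the byte count;
 * anything else returns -1. */
long resolve_uri(const char *uri, const char **path,
                 unsigned char *out, size_t cap)
{
    const char *comma;

    *path = NULL;
    if (strncmp(uri, "file://", 7) == 0) {
        *path = uri + 7;
        return 0;
    }
    if (strncmp(uri, "data:", 5) == 0) {
        comma = strchr(uri, ',');
        if (comma == NULL || comma - uri < 12 ||
            strncmp(comma - 7, ";base64", 7) != 0)
            return -1;
        return b64_decode(comma + 1, out, cap);
    }
    return -1;
}
```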

Signed-off-by: Andreas Florath <xen@flonatel.org>
16 years agolibxc: More LZMA/BZIP fixes.
Keir Fraser [Mon, 24 Aug 2009 07:27:30 +0000 (08:27 +0100)]
libxc: More LZMA/BZIP fixes.

 - Fix an error message in xc_try_bzip2_decode()
 - Check library installation on demand using a Makefile function,
   rather than generating a dependency file. This is cleaner and avoids
   a race when generating the dep file.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>